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ABSTRACT 


Three  methods  for  applying  the  EM  algorithm  to  censored  data  are 
considered,  the  Buckley- James  (1979),  a  proposed  simpler  nonparametric 
method  and  a  normal  model  for  censored  data.  A  new  estimator  for  the 
variance  of  y  in  the  Buckley-J;unes  model  is  proposed  and  simulations 
comparing  the  three  methods  are  described.  To  illustrate  the  use  of 
these  methods  they  arc  applied  to  the  Stanford  heart  transplant  data. 

Key  Words:  Censored  data;  EM  algorithm;  hi near  regression;  Kaplan - 
Meier  estimator. 


I.  Introduction 

There  have  been  many  methods  proposed  for  handling  regression  problems 
in  which  the  dependent  variable  may  be  censored.  While  several  of  these 
methods  assume  an  underlying  distributional  form,  that  is,  noimal,  Weibull 
or  exponential,  others  are  of  a  nonparametric  nature  and  require  minimal 
assumptions  about  the  unspecified  distribution. 

One  of  these  techniques,  developed  by  Cox  (1972),  assumes  that  the  hazard 
function  A(y,x)  =  f(y,x)/(l-F(y,x))  has  the  form 

A(y,x)  *  A0(y)exe 

where  XQ(y)  is  the  underlying  hazard.  He  then  used  conditional  arguments 
to  form  a  partial  likelihood  function  (Cox,  1972,  1975) ,  independent  of 
AQ(y),  for  the  estimation  of  0. 

>  Wet  will  focus  on  three  methods,  which  are  based  on  the  linear  model 

T 

y  *  x  0  +  e. 

Two  of  the  methods  considered  here  are  nonparametric  in  that  the  distribution 
of  y  is  unspecified  while  the  third  assumes  that  y  is  normally  distributed. 

These  methods  rely  on  the  expectation  maximization  (EM)  algorithm,  introduced 
by  Dempster,  Laird  and  Rubin  (1977),  which  is  broadly  applicable  for  computing 
estimates  from  incomplete  data  and  is  used  for  the  estimation  of  0  and  the 
variance  of  y  by  all  three  methods  considered  here.  Although  their  presentation 
was  based  on  a  parametric  model,  the  EM  algorithm  has  been  applied  to 
nonparametric  models  (Buckley  and  James,  1979).  In  section  2  we  discuss  each  of 
these  methods  and  propose  a  new  estimator  of  the  variance  of  y  for  the  Buckley- 
James  method.  In  section  3  we  present  results  from  a  simulation  study 
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which  compares  the  performance  of  the  three  methods.  Finally,  in  section  4 
we  present  an  example  using  data  from  the  Stanford  Heart  transplant  study. 


2.  Estimation  Techniaues  for  Linear  Regression  with  Censored  Data 


We  consider  the  linear  model 


Yj  *  y  +  Ej  j=l,...f  n 


where  the  e.  are  independent  and  identically  distributed  with  the  distribution 

T 

function  F  which  has  mean  zero  and  finite  variance.  The  covariate  vector  Xj 

is  k  dimensional  with  x  .  =  1. 

oj 

We  assume  here  that  the  observed  times  are  generated  by  random  censorship. 

Let  the  life  times  y^ , . . . ,  y^  be  i.i.d.  with  distribution  function  F  as  specified 
above  and  let  the  censoring  times  C^, . . . ,  Cn  be  i.i.d.  with  distribution 
function  G  and  further,  under  the  assumption  of  random  censorship,  let  Cj,...,  Cn 
be  independent  of  y^, . . . ,  yn- 


We  observe 


Zj  =  minfY^Cj) 


1  if  Yj  fC. 
0  if  Yj  >  C. 


n  =  -‘■1  6. 
u  j»l  J 


For  Type  I  censorship,  a  special  case  of  randan  censorship,  the  Cj 's  are  given 
constants. 


v 
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Let  X  be  the  design  matrix  and  define  the  vector  y*(B)  by 


y<(8)  *  fijyj  ♦  (l-fij)  e[Yj  y.  >  cjf  x*e] 


Tiien  the  EM  estimate  of  3  is  the  solution  to 


A  rp  ■<  rp  A 

3  =  (X^'Vy*  (8) . 


This  solution  requires  an  iterative  procedure,  since  E(Yj  Yj  >  Cj ,  x*3) 

is  unknown  and  has  to  be  estimated. 

2 

Hie  variance  o  can  be  estimated  by 


m  a-  -  (y*(6)  -  X3)T  A  (y*(3)  -  X8) 


where  m  is  an  appropriate  constant  and  A  is  a  diagonal  matrix.  It  is  not 

~  2 

immediately  evident  what  to  substitute  in  (7)  for  yt (3)  when  y^  is  censored. 
Aitkin  (1981),  who  considers  the  case  where  F  is  normal,  uses  the  maximum 
likelihood  approach  to  derive  the  form 

yj(8)2  -  6^*  +  (l-«5j)  ETYj2  Yj  >  Cj,xj3]  (8) 

with  m  ■  n  and  A  *  I,  the  identity  matrix.  He  suggests  using  the  bias 


corrected  estimate 


2  2  °U 
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Schmec  and  Hahn  (1979),  who  also  consider  the  normal  model  suggest 


yj(8)2  -  yj2  *  O-y  ErVj  |  Yj  >  Cj.iJej2 


with  m  ■  n-k  and  A  ■  I.  But  unfortunately,  this  approach  results  in  an 

7 

underestimation  of  a  which  becomes  severe  for  moderate  and  heavy  censoring. 


-4- 


A  2 

Consequently,  the  estimate  8  suffers  from  poor  estimation  of  o  .  Since  the 
results  from  the  maximum  likelihood  estimator  are  better  for  moderate  cen¬ 
soring  we  shall  not  Teport  the  results  for  the  Schmee  and  Hahn  estimator. 

Buckley  and  James  (1979)  consider  the  nonparametric  case  where  the  under¬ 
lying  distribution  F  is  unspecified.  They  suggest  that  the  censored  obser- 

2 

vations  be  ignored  for  the  estimation  of  o  and  use 

y]2  *  5jyj2*  A  =  diagC^j),  m  *  nu-k  (11) 

but  introduce  the  correction 

aBJsa  ~  ^  ^(Xj-x^/n/.  (12) 


We  propose  a  nonparametric  estimator  of  o  analogous  to  that  proposed  by 
Aitken  (1981)  in  equation  (8).  In  fact,  the  simulation  studies  we  performed 
suggest  that  this  estimator  of  a  has  a  smaller  bias  and  mean  square  error 
(MSE)  than  the  estimator  proposed  by  Buckley  and  James. 

In  order  to  apply  the  01  algorithm  to  equation  (6)  for  the  estimation 


of  8  we  have  to  establish  appropriate  estimators  for  E(Y  • 


Y  >  C.,  x-B]. 
J  J 


In  the  parametric  case  it  is  easy  to  find  explicit  expressions  for  this 
expectation  and  replace  it  by  an  appropriate  estimate.  In  the  normal  model, 
where  F  ■  <f>,  we  obtain  (Aitkin  (1981)), 


EfYJ 


Yj  >  Cj,3?e1"Xj0  +  11  W^Cj'X)3)  1  °) 


(13) 


and 


E[Yj 2  |Y.  >  Cj.xTgl -(xjs)2  +d2  ♦  ofC^xTs)  W((Cj-xT8)/a5 


(14) 


where  W(u)  »  <h(u)  /  (l-  v  (u))  is  the  hazard  rate  of  the  normal  distribution. 
In  order  to  obtain  an  estimate  foT  the  latter  two  expectations  we  replace 
B  and  o  by  their  current  estimates. 
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Buckley  and  James,  who  consider  the  case  where  the  underlying  distribution 
function  F  is  unknown,  estimate  F  using  the  product  limit  estimator  (Kaplan  and 
Meier,  1958),  „  (  d, 

l1  "(i) 


1  -  FC.,.0  -  <e  fi  -  I6i 

3  ~  emle  1  n<vi  J 


where  e^  <  e^  < 


.  <  e,  are  the  ordered  values  of  the  residuals 
(n) 


A  |  ,  A 

e^(o,B)  =  Z^-x^n^  is  the  number  at  risk  at  e^-  and  d^  denotes  the 


(O 


(i) 


number  of  failures  at  The  Kaplan-Meier  estimator  is  then  used  to 


estimate  E[Y. 

J 

are  replaced  by 


Yj  >  C ,  x^B)  in  the  following  manner.  Uncensored  observations 


EtYj 


'P  'T/v  U  /v  T' 

Yj >  cj-  -  y}  ■  y  ♦  i 


(15) 


where  the  sum  is  over  the  set  of  uncensored  residuals  and 


v'«  ■ 


vk(8)/{1  -  F(C£-xj3)>  when  e^fo.B)  <  eK(o,B) 
0  otherwise 


(16) 


where  v^(8)  is  the  mass  of  the  Kaplan-Meier  estimator  at  the  uncensored  points. 

A 

In  this  case  the  Kaplan-Meier  estimator  F(e,3)  assigns  the  remaining  mass  to 
the  largest  residual  if  it  is  censored. 

2 

We  propose  to  estimate  the  variance  o  by  applying  the  above  reasoning  to 
the  computation  of 


rPA  1  H  ^  A  rp/\  a  /*  fpA 

(y;  (6)  -  xjs)2  =  l  (yf  (B)  +  (xr3)z  -  2y?(B)xL) 
j=l  J  J  j=i  J  J  J  J 

A 

Replacing  the  yj  (B)  by  their  estimated  expectations  we  have 


(17) 


*  l  Jify-  (6)]  +  (x  3) 2  -  2E(yt (B)]x . 3 
j*l  J  J  J  J 


(18) 


and  from  (15)  we  see  that  for  censored  observations 


E[yJ2(6)]  =  l?[(yj(0)  -xy+  2yt(6)x]e-(xTe)2] 

U  A  rp A  1  mA  a  rr*A  1 

=  l  wjK(6)  (yK-x^r  +  2xje  ECyJ(3))-CXj6)  , 


(19) 


and 


E[yt2(*n  +  (xT$)2  -  2x]g  E[yt(3)J  -  f  wjlc(B)  (yK-x*3)2 
substituting  into  equation  (18)  we  have 


This  estimator,  unlike  the  Buckley- James  estimator,  uses  the  information  in 
both  the  censored  and  uncensored  observations  for  the  estimation  of  o.  To 
reduce  bias  we  applied  the  bias  correction 


o„_.  r  =  a 
new 


'2  . 


V°v 


(20) 


It  is  easy  to  see  that  the  Buckley- James  procedure  becomes  quite  complicated, 
particularly  when  a  large  number  of  observations  are  tied.  Therefore,  we  also 

consider  a  modified  method  which  is  very  easy  to  implement.  We  propose 
estimating  the  expectation  with 


Y: > 


cj>*> 


xTb  +  l  e(i)/f 
J  c(i)>ej 


(21) 


where  l  is  the  number  of  r  (?')  >  cj.  This  is  a  simple  average  of  all  censored 

and  uncensored  residuals  e.  «  y*(0)  -  a (3  exceeding  e.  «  C-  -  xT3.  ^is 

11  1  J  J  J 

method,  which  is  in  spirit  of  the  algorithm,  replaces  censored  observations 
by  their  expectations  and  treats  them  as  uncensored  observations. 


To  estimate  the  covariance  matrix  of  0  in  the  normal  model  we  calculate 
the  inverse  of  the  Fisher  information  matrix  (Amemiya  (1973)),  which  would  be 
an  approximation  for  the  random  censorship  case.  However,  for  the  BJ  estimator 
there  are  no  theoretical  results  and  Buckley  and  James  suggest  using  the 
approximation 

cov(§)  =  a2  {(X-X,l)T  A  (X-X11)'1}  (22) 

where  A  =  diag(S^.)  and  X0  has  elements  n^*  6  x^y  Equation  (22)  can 

also  be  used  to  approximate  the  covariance  matrix  of  the  simplified  estimator. 

3.  Simulations 

To  gain  some  insight  into  the  difference  between  the  Buckley- James  (BJ) , 
simplified  nonparametric  (NP)  and  maxinum  likelihood  (ML)  methods  simulation 
studies  were  carried  out.  In  each  case  1000  samples  of  site  50  were  drawn 
with  the  covariates  evenly  spaced  at  2 1',  i= 1,...,  50.  Independent  life  times 
(v.)  and  censoring  times  (c.)  were  generated  for  each  covariate  and  Z.  = 

1  t  t 

minfy^.,  c^)  was  modelled.  The  Beta  distribution 
fM ' 

was  used  to  generate  the  censoring  times  because  a  wide  variety  of  censoring 
patterns  could  be  represented  by  varying  the  parameters,  p  and  q,  of  the 
distribution.  Symmetrical  distributions  are  generated  when  p=q  and  various 
degrees  and  patterns  of  asymmetry  can  be  generated  by  allowing  p  to  be 
unequal  to  q.  Each  of  the  tables  presents  the  average  percent  of  censoring, 
means  of  the  parameters,  MSE  of  the  parameters  and  the  percentage  of  runs 
which  did  not  converge.  This  percentage  was  computed  using  the  total 
number  of  samples  required  to  achieve  1000  convergent  samples. 


-8- 

Table  1  presents  the  results  for  a  normally  distributed  lifetime  with 
equal  censoring,  that  is,  the  percentage  of  censoring  is  the  same  at  each  de¬ 
sign  point.  This  case  is  of  interest  because  the  BJ  estimator  of  o  is  thought 
to  be  well-behaved  under  this  assumption;  however,  wc  found  that  in  general 
the  BJ  estimator  had  the  largest  bias  and  MSE  of  all  estimators  of  a  con¬ 
sidered.  The  best  estimator  of  a  was  the  new  estimator  which  is  defined 
by  equation  (20).  This  estimator  uses  the  information  from  the  BJ  results 
and  makes  full  use  of  the  censored  data  so  that  in  general  it  has  a  smaller 
bias  and  MSE  than  the  other  estimators  and,  in  particular,  this  estimator 
always  behaves  better  than  the  BJ  estimator.  Both  the  BJ  and  NP  estimators 
underestimated  a  but  the  NP  estimator  was  less  biased  than  the  BJ  ind  was 
often  less  biased  than  the  ML  estimator  of  o.  The  results  for  all  three 
methods  of  estimating  0Q  and  8-^  were  very  similar  with  all  methods  being 
biased  for  the  estimation  of  8Q.  While  the  ML  estimator  performed  better  for 
the  estimation  of  8Q  under  heavy  censoring  (p=l,  q=2) ,  the  BJ  estimator  was 
superior  for  the  estimation  of  8^  in  this  case.  Hie  percentage  of  non- 
convergent  samples  was  similar  in  most  of  the  cases  with  the  ML  method 
having  the  most  problems  in  the  case  of  heavy  censoring. 

Table  2  presents  the  results  for  a  normally  distributed  lifetime  with 
increasing  censoring.  In  general  the  simulations  show  that  the  ML  estimate 
tends  to  overestimate  a  while  the  BJ,  the  NP  and  the  new  estimators  tend  to 
underestimate  o.  The  NP  estimator  also  tends  to  underestimate  6-^;  however, 
it  has  the  smallest  MSE  in  all  the  cases  considered.  Both  the  NP  and  ML 
estimators  of  a  tend  to  be  better  than  the  BJ  estimator;  however,  the  new 
estimator  is  generally  superior  to  all  the  others  in  terms  of  bias  and  MSE. 

The  results  for  the  estimation  of  8Q  and  8j  tend  to  be  very  similar  with 
the  ML  estimator  doing  very  poorly  in  the  case  of  heavy  censoring  (p=l ,  q»2) . 


-9- 


Qnce  again  all  the  methods  are  biased  for  the  estimation  of  0Q  with  the 
BJ  performing  the  worst.  As  before,  although  the  NP  estimator  of  ^  generally 
has  a  smaller  MSE  it  is  always  biased  downward. 

Table  3  contains  simulations  where  a  beta  distribution  was  used  to 
generate  both  the  lifetime  and  censoring  distributions.  The  simulations 
were  performed  to  explore  the  robustness  of  the  ML  method  to  departures 
from  normality  and  to  compare  the  behavior  of  the  parametric  and  nonparametric 
methods  in  this  setting.  In  general,  the  new  estimator  of  o  had  the  smallest 
MSE  and  bias.  In  this  instance  the  BJ  estimator  of  a  behaved  very  poorly. 

For  the  symmetrical  (p=q)  lifetime  distributions  the  ML  estimator  of  tended 
to  perform  somewhat  better  than  either  of  the  other  two,  however,  it  was  not 
consistently  superior  to  the  other  estimators.  The  BJ  estimator  of  BQ  tended 
to  be  the  least  biased  of  the  methods  and  often  had  the  smallest  MSE.  For 
the  case  of  the  asymmetric  lifetime  distributions  the  BJ  estimator  of  Bj  was 
the  least  biased  of  the  three  estimators  but  had  the  largest  MSE,  with  the 
same  results  holding  for  3Q.  Thus,  the  violation  of  the  distributional 
assumption,  although  it  does  have  seme  impact,  did  not  seem  to  seriously 
affect  the  bias  and  MSE  of  the  ML  estimator. 
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4.  Stanford  Heart  Transplant  Example 


In  order  to  further  explore  the  relationship  between  the  BJ,  ML  and  NP 
estimators  and  to  see  how  our  proposed  estimator  of  a  for  the  BJ  method 
per fonneti  in  a  set  of  data,  we  analyzed  the  Stanford  heart  transplant  data 
given  in  Miller  and  Halpetin(1982) .  We  modelled  survival  time  (log  10) 
as  a  function  of  age  for  survival  times  greater  than  10  days.  These  results, 
given  in  table  4,  follow  the  pattern  of  the  previous  simulations  with  the 
new  estimator  (20)  giving  a  larger  estimate  of  o  than  the  BJ  estimator.  The 
ML  estimate  of  the  slope  8  is  almost  identical  with  the  BJ  estimate.  The 
NP  estimator  results  in  a  somewhat  smaller  estimate  of  8Q  and  6^  than  the 
ML  and  BJ  estimators. 


A  A 


6o 

8i 

0 

MLE 

3.826 

-.02453 

.890 

BJ 

3.745 

-.02434 

.655 

(.761)* 

NP 

3.645 

-.0229 

.751 

Table  4:  Estimators  for  the  heart  transplant  data. 
*new  estimator  (20)  of  o 
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5.  Conclusions 

TVpically,  all  three  methods  for  linear  regression  with  censored  data 
were  very  similar  with  no  particular  estimator  of  slope  appearing  superior. 
However,  we  did  find  that  the  new  estimator  of  o  enhanced  the  BJ  procedure 
and  would  cause  any  tests  of  significance  for  this  model  to  be  less  anti¬ 
conservative  so  that  we  would  reconmend  the  use  of  this  new  estimator.  We 
also  found  that  the  normal  model  tended  to  be  fairly  robust  against  departures 
from  normality  and  in  general  performed  quite  well. 


Helmut  Schneider’s  work  was  supported  by  the  Deutsche  Forschungsgemeinschaft 
and  the  Air  Force  Office  of  Scientific  Research. 

Lisa  Weissfeld's  work  was  partially  supported  by  National  Heart,  Lung  and 
Blood  Institute  contract  N01-HV-12243-L. 


REFERENCES 


AITKEN,  M.  (1981).  A  note  on  the  regression  analysis  of  censored  data. 
Technometrics.  23,  161-163. 

AMBfIYA,  T.  (1973).  Regression  analysis  when  the  dependent  variable  is 

truncated  normal.  Econometrics ,  41,  997-1016. 


BUCKLEY,  J.  and  JAMES,  J.  (1979).  Linear  regression  with  censored  data. 
Biometrika,  66,  429-436. 

COX,  D.R.  (1975).  Partial  likelihood.  Biometrika,  62,  269-76. 

COX,  D.R.  (1972).  Regression  models  and  life  tables  (with  discussion). 

J.  Roy.  Statist.  Soc. ,  B34,  187-202. 

DEMPSTER,  A.P. ,  LAIRD,  N.M.  and  RUBIN,  D.B.  (1977),  Maximum  likelihood  from 
incomplete  data  via  the  EM  algorithm  (with  discussion) .  J.  Roy.  Statist. 
Soc. ,  B39,  1-22. 

MILLER,  R.  and  HALPERIN,  J.  (1982).  Regression  with  censored  data. 

Biometrika ,  69,  521-531. 

SOMEE,J.  and  HAHN,  G.J.  (1979).  A  simple  method  for  regression  analysis 
with  censored  data.  Technometrics,  21,  417-432. 


I 


END 

* 

FILMED 

12-84 

v  ■  v/  '  '  ^ 

■‘>ts  _  '  • 

DTIC 


